The data set that I am exploring here is White Wine Quality, which is publicly available for research. Its authors are Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV). It covers white variants of the Portuguese “Vinho Verde” wine. It contains 4898 instances with 12 variables, and a general summary of the data set is given below.
## [1] "Frequency distribution of quality"
##
## 3 4 5 6 7 8 9
## 20 163 1457 2198 880 175 5
## [1] "Variable: fixed.acidity"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
## [1] "Variable: volatile.acidity"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
## [1] "Variable: citric.acid"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
## [1] "Variable: residual.sugar"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
## [1] "Variable: chlorides"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
## [1] "Variable: free.sulfur.dioxide"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
## [1] "Variable: total.sulfur.dioxide"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 134.0 138.4 167.0 440.0
## [1] "Variable: density"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
## [1] "Variable: pH"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
## [1] "Variable: sulphates"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4898 0.5500 1.0800
## [1] "Variable: alcohol"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
2. Variables like fixed.acidity, volatile.acidity, citric.acid, free.sulfur.dioxide, total.sulfur.dioxide and alcohol appear unimodal and right-skewed (a roughly Poisson-like shape), and all of them except alcohol have a long tail on the right side.
3. The variables pH and sulphates are roughly normally distributed.
4. The variables chlorides and residual.sugar have very long tails on the positive side.
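The shape descriptions above can also be checked numerically. The sketch below defines a hypothetical helper, `skewness`, using the plain moment formula (third central moment divided by the second central moment raised to 1.5, with no small-sample correction). Values near zero indicate a symmetric shape and large positive values a long right tail. It is run on toy vectors only.

```r
# Moment-based sample skewness (hypothetical helper, no small-sample correction).
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Toy vectors for illustration only (not taken from the wine data).
symmetric    <- c(1, 2, 3, 4, 5)
right_tailed <- c(1, 1, 2, 2, 3, 20)

skewness(symmetric)     # zero: no tail on either side
skewness(right_tailed)  # positive: long tail on the right
```

Applied to the real data, `sapply(white_wine[1:11], skewness)` would give one value per input variable; clearly positive values for chlorides and residual.sugar would be consistent with the long tails described above.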
str(white_wine)
## 'data.frame': 4898 obs. of 12 variables:
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
This data set has information about 4898 white Vinho Verde wines. Each wine has 12 variables associated with it: 11 physicochemical measurements and a quality score.
The main feature of interest to me in this data set is the quality associated with each wine. I want to understand how the other, independent variables are related to the quality of a wine.
print("Correlation between variables and quality")
## [1] "Correlation between variables and quality"
abs(round(cor(white_wine),3))[-12,"quality"]
## fixed.acidity volatile.acidity citric.acid
## 0.114 0.195 0.009
## residual.sugar chlorides free.sulfur.dioxide
## 0.098 0.210 0.008
## total.sulfur.dioxide density pH
## 0.175 0.307 0.099
## sulphates alcohol
## 0.054 0.436
Individually, each variable is only weakly correlated with quality (the largest absolute correlation is 0.436, for alcohol), so the variables will need to be used together to predict quality.
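The correlations printed above are wrapped in `abs()`, so they show how strong each relationship is but not its direction. A small sketch that keeps the sign and skips non-numeric columns (the helper name `signed_target_cor` is hypothetical):

```r
# Signed Pearson correlation of every numeric column with the target column,
# sorted from most negative to most positive (hypothetical helper).
signed_target_cor <- function(df, target = "quality", digits = 3) {
  num <- df[vapply(df, is.numeric, logical(1))]  # drop logical/factor columns
  r <- cor(num)[, target]
  round(sort(r[names(r) != target]), digits)
}

# Toy data (illustrative only): a rises with quality, b falls with it.
toy <- data.frame(
  a       = c(1, 2, 3, 4, 5),
  b       = c(5, 4, 3, 2, 1),
  quality = c(3, 4, 5, 6, 7)
)
signed_target_cor(toy)  # b is -1, a is 1
```

On the real data, `signed_target_cor(white_wine)` would list the variables that fall as quality rises first and those that rise with quality last; the logical qlesseqFive and factor qualety columns created below are skipped automatically.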
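Yes, I created one new variable, qlesseqFive, which is TRUE for wines with quality less than or equal to five: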
white_wine$qlesseqFive <- white_wine$quality <= 5
For ease of prediction, I also convert the quality variable into a categorical feature by storing it as a factor variable.
white_wine$qualety <- factor(white_wine$quality)
## [1] "Summary of fixed.acidity By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.200 6.575 7.300 7.600 8.525 11.800
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.800 6.400 6.900 7.129 7.600 10.200
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.500 6.400 6.800 6.934 7.400 10.300
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.838 7.300 14.200
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.200 6.200 6.700 6.735 7.200 9.200
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.900 6.200 6.800 6.657 7.300 8.200
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.60 6.90 7.10 7.42 7.40 9.10
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 45 45.05 64.08 1.48e-15 ***
## Residuals 4896 3442 0.70
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Summary of volatile.acidity By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1700 0.2375 0.2600 0.3332 0.4125 0.6400
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1100 0.2700 0.3200 0.3812 0.4600 1.1000
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.100 0.240 0.280 0.302 0.340 0.905
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2000 0.2500 0.2606 0.3000 0.9650
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.1900 0.2500 0.2628 0.3200 0.7600
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.2000 0.2600 0.2774 0.3300 0.6600
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.240 0.260 0.270 0.298 0.360 0.360
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 1.89 1.8864 193 <2e-16 ***
## Residuals 4896 47.86 0.0098
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Summary of citric.acid By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2100 0.2575 0.3450 0.3360 0.3850 0.4700
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1900 0.2900 0.3042 0.4000 0.8800
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2400 0.3200 0.3377 0.4100 1.0000
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.270 0.320 0.338 0.380 1.660
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0100 0.2800 0.3100 0.3256 0.3600 0.7400
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0400 0.2800 0.3200 0.3265 0.3600 0.7400
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.290 0.340 0.360 0.386 0.450 0.490
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 0.01 0.006082 0.415 0.519
## Residuals 4896 71.71 0.014648
## [1] "Summary of residual.sugar By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.588 4.600 6.392 10.700 16.200
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.300 2.500 4.628 7.100 17.550
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.800 7.000 7.335 11.500 23.500
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.700 5.300 6.442 9.900 65.800
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.700 3.650 5.186 7.325 19.250
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.800 2.100 4.300 5.671 8.200 14.800
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.60 2.00 2.20 4.12 4.20 10.60
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 1199 1199.5 47.06 7.72e-12 ***
## Residuals 4896 124780 25.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Summary of chlorides By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02200 0.03625 0.04100 0.05430 0.05400 0.24400
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0130 0.0380 0.0460 0.0501 0.0540 0.2900
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.04000 0.04700 0.05155 0.05300 0.34600
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01500 0.03600 0.04300 0.04522 0.04900 0.25500
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.03100 0.03700 0.03819 0.04400 0.13500
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01400 0.03000 0.03600 0.03831 0.04400 0.12100
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0180 0.0210 0.0310 0.0274 0.0320 0.0350
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 0.103 0.10302 225.7 <2e-16 ***
## Residuals 4896 2.235 0.00046
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Summary of free.sulfur.dioxide By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 13.25 33.50 53.32 47.50 289.00
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 9.00 18.00 23.36 30.50 138.50
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 22.00 35.00 36.43 50.00 131.00
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 24.00 34.00 35.65 46.00 112.00
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 25.00 33.00 34.13 41.00 108.00
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 28.00 35.00 36.72 44.50 105.00
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 24.0 27.0 28.0 33.4 31.0 57.0
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 94 94.27 0.326 0.568
## Residuals 4896 1416327 289.28
## [1] "Summary of total.sulfur.dioxide By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.0 105.8 159.5 170.6 210.0 440.0
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 85.0 117.0 125.3 171.5 272.0
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 121.0 151.0 150.9 182.0 344.0
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 107.2 132.0 137.0 164.0 294.0
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.0 101.0 122.0 125.1 144.2 229.0
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 59.0 102.5 122.0 126.2 150.0 212.5
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 85 113 119 116 124 139
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 270047 270047 154.2 <2e-16 ***
## Residuals 4896 8574354 1751
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Summary of density By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9911 0.9925 0.9944 0.9949 0.9969 1.0000
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9892 0.9926 0.9941 0.9943 0.9958 1.0000
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9872 0.9933 0.9953 0.9953 0.9972 1.0020
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9876 0.9917 0.9937 0.9940 0.9959 1.0390
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9906 0.9918 0.9925 0.9937 1.0000
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9903 0.9916 0.9922 0.9935 1.0010
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9896 0.9898 0.9903 0.9915 0.9906 0.9970
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 0.00413 0.004132 509.9 <2e-16 ***
## Residuals 4896 0.03967 0.000008
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Summary of pH By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.870 3.035 3.215 3.188 3.325 3.550
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.830 3.070 3.160 3.183 3.280 3.720
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.790 3.080 3.160 3.169 3.240 3.790
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.080 3.180 3.189 3.280 3.810
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.840 3.100 3.200 3.214 3.320 3.820
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.940 3.120 3.230 3.219 3.330 3.590
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.200 3.280 3.280 3.308 3.370 3.410
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 1.1 1.1038 48.88 3.08e-12 ***
## Residuals 4896 110.5 0.0226
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Summary of sulphates By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2800 0.3800 0.4400 0.4745 0.5425 0.7400
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2500 0.3800 0.4700 0.4761 0.5400 0.8700
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2700 0.4200 0.4700 0.4822 0.5300 0.8800
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2300 0.4100 0.4800 0.4911 0.5500 1.0600
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4800 0.5031 0.5800 1.0800
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2500 0.3800 0.4600 0.4862 0.5850 0.9500
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.360 0.420 0.460 0.466 0.480 0.610
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 0.18 0.18378 14.15 0.000171 ***
## Residuals 4896 63.60 0.01299
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Summary of alcohol By quality"
## white_wine$qualety: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.55 10.45 10.34 11.00 12.60
## --------------------------------------------------------
## white_wine$qualety: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.40 10.10 10.15 10.75 13.50
## --------------------------------------------------------
## white_wine$qualety: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.000 9.200 9.500 9.809 10.300 13.600
## --------------------------------------------------------
## white_wine$qualety: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 9.60 10.50 10.58 11.40 14.00
## --------------------------------------------------------
## white_wine$qualety: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.60 10.60 11.40 11.37 12.30 14.20
## --------------------------------------------------------
## white_wine$qualety: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 11.00 12.00 11.64 12.60 14.00
## --------------------------------------------------------
## white_wine$qualety: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 12.40 12.50 12.18 12.70 12.90
## [1] "One-way ANOVA test"
## Df Sum Sq Mean Sq F value Pr(>F)
## quality 1 1407 1407.0 1146 <2e-16 ***
## Residuals 4896 6009 1.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
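Each "One-way ANOVA test" table above reports a single degree of freedom for quality, with 4896 = 4898 - 2 residual degrees of freedom. A one-way ANOVA on the seven-level quality factor would have 6 degrees of freedom, so these tables appear to come from a model that treats quality as a number, which amounts to testing a linear trend (a simple regression). The sketch below contrasts the two fits on toy data; on the real data, the factor version would be `aov(alcohol ~ qualety, data = white_wine)`, and it can also pick up differences between levels that do not follow a straight line.

```r
# Toy data (illustrative only): a response measured at three quality levels.
toy <- data.frame(
  quality = rep(c(4, 5, 6), each = 4),
  y = c(1.0, 1.2, 0.9, 1.1,
        2.0, 2.1, 1.9, 2.2,
        1.5, 1.4, 1.6, 1.5)
)

# quality as a number: one slope, so 1 degree of freedom (linear-trend test).
trend_fit <- aov(y ~ quality, data = toy)

# quality as a factor: one mean per level, so (levels - 1) = 2 degrees of freedom.
anova_fit <- aov(y ~ factor(quality), data = toy)

summary(trend_fit)
summary(anova_fit)
```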
## [1] "Correlation matrix for independent variables"
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.000 0.023 0.289
## volatile.acidity 0.023 1.000 0.149
## citric.acid 0.289 0.149 1.000
## residual.sugar 0.089 0.064 0.094
## chlorides 0.023 0.071 0.114
## free.sulfur.dioxide 0.049 0.097 0.094
## total.sulfur.dioxide 0.091 0.089 0.121
## density 0.265 0.027 0.150
## pH 0.426 0.032 0.164
## sulphates 0.017 0.036 0.062
## alcohol 0.121 0.068 0.076
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.089 0.023 0.049
## volatile.acidity 0.064 0.071 0.097
## citric.acid 0.094 0.114 0.094
## residual.sugar 1.000 0.089 0.299
## chlorides 0.089 1.000 0.101
## free.sulfur.dioxide 0.299 0.101 1.000
## total.sulfur.dioxide 0.401 0.199 0.616
## density 0.839 0.257 0.294
## pH 0.194 0.090 0.001
## sulphates 0.027 0.017 0.059
## alcohol 0.451 0.360 0.250
## total.sulfur.dioxide density pH sulphates alcohol
## fixed.acidity 0.091 0.265 0.426 0.017 0.121
## volatile.acidity 0.089 0.027 0.032 0.036 0.068
## citric.acid 0.121 0.150 0.164 0.062 0.076
## residual.sugar 0.401 0.839 0.194 0.027 0.451
## chlorides 0.199 0.257 0.090 0.017 0.360
## free.sulfur.dioxide 0.616 0.294 0.001 0.059 0.250
## total.sulfur.dioxide 1.000 0.530 0.002 0.135 0.449
## density 0.530 1.000 0.094 0.074 0.780
## pH 0.002 0.094 1.000 0.156 0.121
## sulphates 0.135 0.074 0.156 1.000 0.017
## alcohol 0.449 0.780 0.121 0.017 1.000
1. volatile.acidity, chlorides and density tend to decrease as the quality of wine increases.
2. alcohol increases with the quality of wine, clearly so from quality 5 upward, and pH rises slightly.
3. fixed.acidity, residual.sugar, citric.acid and sulphates change little in comparison with the change in wine quality; the ANOVA test for citric.acid is not significant (p = 0.519).
4. free.sulfur.dioxide and total.sulfur.dioxide are lowest for quality-4 wines, higher for wines of quality 5 and 6, and lower again for the highest-quality wines (most clearly for total.sulfur.dioxide).
When aiming for a higher quality of wine, the variables that change most are alcohol, density, chlorides and volatile.acidity, which are also the four variables most strongly correlated with quality (0.436, 0.307, 0.210 and 0.195).
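The trend statements above can be checked one variable at a time. The hypothetical helper below reports the median of a variable at each quality level together with Spearman's rank correlation, which measures monotone trends without assuming they are linear. It is shown on toy data only.

```r
# Per-level medians plus Spearman's rank correlation with the target
# (hypothetical helper).
trend_summary <- function(df, var, target = "quality") {
  medians <- tapply(df[[var]], df[[target]], median)
  rho <- cor(df[[var]], df[[target]], method = "spearman")
  list(medians = medians, spearman = rho)
}

# Toy data (illustrative only).
toy <- data.frame(
  quality = c(4, 4, 5, 5, 6, 6, 7, 7),
  alcohol = c(9.0, 9.4, 9.5, 9.9, 10.6, 11.0, 11.8, 12.2)
)
res <- trend_summary(toy, "alcohol")
res  # medians rise at every level; spearman is close to 1
```

On the real data, calls such as `trend_summary(white_wine, "pH")` or `trend_summary(white_wine, "chlorides")` would show the direction and consistency of each trend.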
The most interesting relationship I found is also the strongest one: residual.sugar and density, with a correlation coefficient of 0.839, the highest of any pair of variables. This is plausible, since dissolved sugar makes a liquid denser.
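The strongest pair can also be found programmatically instead of by scanning the matrix. Note that the correlation matrix printed above contains no negative entries, which suggests it was passed through `abs()` like the quality correlations, so it shows strength but not direction. The sketch below (hypothetical helper `strongest_pair`) ranks pairs by absolute value but returns the signed coefficient.

```r
# Pair of distinct numeric columns with the largest absolute Pearson
# correlation (hypothetical helper); the signed value is returned.
strongest_pair <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  r <- cor(num)
  r[lower.tri(r, diag = TRUE)] <- NA   # keep each pair once, drop the diagonal
  idx <- which(abs(r) == max(abs(r), na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(vars = c(rownames(r)[idx[1]], colnames(r)[idx[2]]),
       r = r[idx[1], idx[2]])
}

# Toy data (illustrative only): density is exactly linear in sugar.
toy <- data.frame(sugar = c(1, 3, 5, 8, 13, 20))
toy$density <- 0.99 + 0.0004 * toy$sugar
toy$alcohol <- c(12, 11, 11.5, 10, 10.5, 9)

res <- strongest_pair(toy)
res$vars  # "sugar" "density"
```

On the real data one would restrict it to the eleven inputs, for example `strongest_pair(white_wine[1:11])`.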
The aim of generating these plots is to find combinations of variables that can help us distinguish wines with quality less than or equal to five from those with higher quality.
From the plots, the following pairs seem able to help distinguish higher-quality wines from wines rated five or below: sulphates and alcohol, chlorides and alcohol, volatile.acidity and alcohol, and volatile.acidity and sulphates.
Even though free.sulfur.dioxide and total.sulfur.dioxide are moderately correlated with each other (0.616), the plots show that many low-quality wines have a higher value of total.sulfur.dioxide for a given value of free.sulfur.dioxide. So, together, these two variables can provide some insight into wine quality.
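One way to turn "more total.sulfur.dioxide for a given free.sulfur.dioxide" into a single number is the share of total sulfur dioxide that is free. This ratio is an illustrative derived feature, not one used elsewhere in this report, and both helper names below are hypothetical.

```r
# Share of total sulfur dioxide that is free (hypothetical derived feature).
free_so2_share <- function(free, total) free / total

# Median share in each quality group, split by the logical qlesseqFive column.
share_by_group <- function(df) {
  share <- free_so2_share(df$free.sulfur.dioxide, df$total.sulfur.dioxide)
  tapply(share, df$qlesseqFive, median)
}

# Toy data (illustrative only): the TRUE group has more total SO2
# for similar free SO2.
toy <- data.frame(
  free.sulfur.dioxide  = c(30, 40, 35, 30, 40, 35),
  total.sulfur.dioxide = c(100, 120, 110, 160, 190, 170),
  qlesseqFive          = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
)
res <- share_by_group(toy)
res  # the FALSE group (quality above five) has the larger share here
```

On the real data, `share_by_group(white_wine)` would show whether the pattern seen in the plots also appears in this single ratio.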
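Splitting quality at five (five or below versus six and above) divides the data set into groups of 1640 and 3258 wines, roughly one third and two thirds of the data, so the two groups are not of the same size. Most wines have a quality of 5 or 6 (3655 of 4898, about 75%).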
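The group sizes follow directly from the frequency distribution of quality at the top of the report; the arithmetic is:

```r
# Frequency of each quality score, copied from the summary at the top of this report.
quality_counts <- c(`3` = 20, `4` = 163, `5` = 1457, `6` = 2198,
                    `7` = 880, `8` = 175, `9` = 5)
scores <- as.numeric(names(quality_counts))

low  <- sum(quality_counts[scores <= 5])  # 1640
high <- sum(quality_counts[scores >= 6])  # 3258
round(c(low = low, high = high) / sum(quality_counts), 3)
```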
These plots show that the median alcohol content increases with quality from quality 5 upward, while the medians of sulphates and citric.acid change only slightly across quality levels.
Although each variable is only weakly correlated with quality on its own, combining two of them makes it possible to draw a line that roughly separates wines with quality less than or equal to five from those with higher quality.
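One way to make such a separating line concrete is a two-predictor logistic regression on qlesseqFive: the set of points where the predicted probability equals 0.5 is a straight line in the plane of the two predictors. This is a substitute technique suggested for illustration, not the method used to produce the plots in this report; the helper `separating_line` is hypothetical, and the toy data come from an assumed generating model.

```r
# Fit P(quality <= 5) from two predictors and return the 50% boundary
# as y = intercept + slope * x (hypothetical helper).
separating_line <- function(df, x, y, outcome = "qlesseqFive") {
  df[[outcome]] <- as.numeric(df[[outcome]])   # TRUE/FALSE -> 1/0
  fit <- glm(reformulate(c(x, y), response = outcome),
             data = df, family = binomial)
  b <- coef(fit)
  # On the boundary the linear predictor is zero:
  # b[1] + b[2] * x + b[3] * y = 0  =>  y = -b[1] / b[3] - (b[2] / b[3]) * x
  list(fit = fit,
       intercept = unname(-b[1] / b[3]),
       slope     = unname(-b[2] / b[3]))
}

# Simulated toy data (illustrative only; the generating model is an assumption).
set.seed(42)
n <- 400
toy <- data.frame(alcohol          = rnorm(n, mean = 10.5, sd = 1.2),
                  volatile.acidity = rnorm(n, mean = 0.28, sd = 0.1))
p_low <- plogis(8 - 0.9 * toy$alcohol + 6 * toy$volatile.acidity)
toy$qlesseqFive <- runif(n) < p_low

boundary <- separating_line(toy, "alcohol", "volatile.acidity")
c(intercept = boundary$intercept, slope = boundary$slope)
```

With the real data, `separating_line(white_wine, "alcohol", "volatile.acidity")` would return a line that could be added to the corresponding scatterplot with `abline(boundary$intercept, boundary$slope)`.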
In this data set I tried to find out how the quality of a wine is related to its different properties, and at first it was frustrating to see that no single variable on its own can do justice to wine quality.
We can use different variables in combination to get a better grasp of wine quality.
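A simple check that combining variables helps is to compare the R² of a one-variable linear model with that of a model using several inputs. Treating the 3 to 9 quality score as a continuous response is a simplification, and the helper `compare_fits` is hypothetical; it is demonstrated on simulated toy data only.

```r
# In-sample R^2 of a one-predictor model versus a multi-predictor model
# (hypothetical helper; quality treated as a continuous response).
compare_fits <- function(df, predictors, best_single, target = "quality") {
  single <- lm(reformulate(best_single, response = target), data = df)
  full   <- lm(reformulate(predictors, response = target), data = df)
  c(single   = summary(single)$r.squared,
    full     = summary(full)$r.squared,
    full_adj = summary(full)$adj.r.squared)
}

# Simulated toy data (illustrative only).
set.seed(7)
n <- 200
toy <- data.frame(alcohol = rnorm(n), volatile.acidity = rnorm(n))
toy$quality <- 0.5 * toy$alcohol - 0.5 * toy$volatile.acidity + rnorm(n)

r2 <- compare_fits(toy, c("alcohol", "volatile.acidity"), "alcohol")
round(r2, 3)
```

On the real data this could be run as `compare_fits(white_wine, names(white_wine)[1:11], "alcohol")`. Because adding predictors never lowers in-sample R², the adjusted R² (or a held-out split) gives a fairer comparison.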
One thing that I did not like about this data set was that quality was the only variable that seemed to me worth exploring as an outcome. It would have been great if price or other economic variables had also been included in this data set.